Cancer Epidemiology, Biomarkers & Prevention
● American Association for Cancer Research (AACR)
Preprints posted in the last 7 days, ranked by how well they match Cancer Epidemiology, Biomarkers & Prevention's content profile, based on 17 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Bey, G. S.; Bowen, M. B.; Wu, S.; Boykin, M.; Bernard, L.; Zhang, Q.; Melendez, B.; Celestino, J.; Batsis, J. A.; Sun, C.; Lin, F.-C.; Yates, M. S.
Background: Endometrial cancer incidence and mortality are increasing, particularly among Black women and for aggressive subtypes. Allostatic load (AL), a composite measure of physiologic dysregulation across metabolic, cardiovascular, and immune systems, varies by racial category and tumor subtype in other cancers. Endometrial cancer is strongly associated with obesity, and it is unknown whether AL scores maintain sufficient heterogeneity to evaluate differences across subgroups or with clinical outcomes. Objective: To describe the performance of AL scoring in endometrial cancer patients and examine associations with tumor characteristics (grade/histology) and survival outcomes. Methods: We evaluated AL among 398 participants newly diagnosed with endometrial cancer. AL score was calculated by assigning 1 point for each "high-risk" value (defined by clinical reference range or distribution-based threshold) across 15 biologic variables covering vital signs, anthropometrics, blood-based biomarkers, and medical comorbidities. Results: Distribution-based thresholds for variables were used to preserve heterogeneity in this obesity-dominant context. Overall, 68.7% of Black women had high AL compared to White (56.7%), Hispanic (56.7%), and other race (32.3%) women. Decision tree analyses revealed grade-dependent associations between AL and survival. For women with low-grade tumors, higher AL was associated with poorer overall survival. For high-grade tumors, intermediate AL (≥4, <8) was associated with the shortest overall survival. Black women with low-grade disease experienced shorter progression-free survival regardless of AL. Conclusions: AL scoring maintains heterogeneity despite high obesity prevalence in endometrial cancer. Varying relationships between AL and survival by tumor grade and ethnoracial group suggest cumulative physiologic burden and social/structural factors may jointly shape endometrial cancer disparities.
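The one-point-per-high-risk-value scoring described in the abstract above can be sketched roughly as follows. This is an illustration only, not the authors' scoring code: the upper-quartile cutoff, the variable names (`systolic_bp`, `bmi`), and the toy cohort values are all assumptions, and the sketch treats "high" as the risky direction for every variable, which would not hold for markers where low values signal risk.

```python
import math


def upper_quartile_cutoff(values):
    """Nearest-rank 75th percentile of a cohort's measurements.

    Assumption: the abstract says only "distribution-based"; the upper
    quartile is a hypothetical choice made for this sketch.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(0.75 * len(ordered)))
    return ordered[rank - 1]


def allostatic_load(patient, cutoffs):
    """Count one point for each variable whose value exceeds its cutoff."""
    return sum(1 for name, cutoff in cutoffs.items() if patient[name] > cutoff)


# Toy cohort with two hypothetical variables (the study used 15).
cohort = {
    "systolic_bp": [110, 120, 130, 140, 150, 160, 170, 180],
    "bmi": [22, 25, 28, 31, 34, 37, 40, 43],
}
cutoffs = {name: upper_quartile_cutoff(vals) for name, vals in cohort.items()}

patient = {"systolic_bp": 175, "bmi": 30}
score = allostatic_load(patient, cutoffs)
print(cutoffs, score)
```

Deriving cutoffs from the cohort itself, rather than from fixed clinical ranges, is what keeps the score from saturating when most participants exceed a clinical threshold such as the one for obesity.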
Aversa, I.; Abatino, A.; Isabello, A.; Gallo, R.; Isdraele, L.; Straface, T.; Zullo, F. M.; Guida, M.; Saccone, G.; Fiume, G.; Venturella, R.; Viglietto, G.; Cuda, G.; Costanzo, F.; Zullo, F.; Palmieri, C.
Background: Endometrial cancer exhibits marked molecular and immune heterogeneity that is only partially explained by established genomic biomarkers. We investigated whether T cell receptor (TCR) repertoire architecture captures complementary dimensions of antitumor immunity beyond conventional molecular classification. Methods: Paired tumor and peripheral blood samples from eight patients with molecularly characterized endometrial cancer underwent TCR repertoire profiling. Diversity, clonality, and tumor-blood overlap metrics were integrated with genomic variables, including tumor mutational burden (TMB), genomic instability metric (GIM), and POLE status. Principal component analysis and correlation analyses were used to identify major dimensions of repertoire organization. Composite Immune Focusing and Immune Sharing Scores were derived to summarize dominant repertoire patterns. Results: The first two principal components explained 70.1% of total repertoire variance and revealed substantial heterogeneity independent of histological subtype. TMB was strongly associated with reduced repertoire diversity and increased clonal dominance, resulting in a robust association with the Immune Focusing Score (ρ = 0.88, p = 0.004). POLE-mutated tumors occupied the extreme end of this focusing continuum. In contrast, genomic instability was associated with increased tumor-blood repertoire overlap and preserved diversity, reflected by a strong correlation between GIM and the Immune Sharing Score (ρ = 0.76, p = 0.027). The two immune scores showed minimal correlation with each other (ρ = -0.24, p = 0.57), indicating that they capture largely independent aspects of immune organization. Conclusion: Integrative analysis of TCR repertoire architecture and tumor genomics identifies distinct immunogenomic states in endometrial cancer that are not fully captured by conventional molecular classification. 
If validated in larger cohorts, immune focusing and immune sharing metrics may provide complementary biomarkers for patient stratification and immunotherapy-oriented precision oncology.
Yerukala Sathipati, S.; Scott, H.
Importance: Hereditary breast and ovarian cancer (HBOC) variant carriers benefit from risk-reducing interventions, but only if identified. The extent to which carriers are clinically recognized, and whether recognition is equitable across diverse populations, is poorly characterized in a single large U.S. cohort. Objective: To estimate P/LP HBOC carrier prevalence across genetic ancestry groups, quantify documented clinical genetic testing among carriers, and evaluate ancestry and socioeconomic disparities in testing. Design, Setting, and Participants: Cross-sectional analysis of the All of Us Research Program Controlled Tier (Curated Data Repository v8/C2024Q3R9), comprising participants with short-read whole genome sequencing and linked electronic health record (EHR) and survey data. Carriers were ascertained from research genomic data independent of clinical testing. Exposures: Genetically inferred ancestry (African [AFR], Admixed American [AMR], East Asian [EAS], European [EUR], Middle Eastern [MID], South Asian [SAS]); self-reported household income and educational attainment. Main Outcomes and Measures: (1) Carrier prevalence with Wilson 95% CIs; (2) documented clinical genetic testing (procedure codes) among carriers; (3) adjusted odds of documented testing among women, by ancestry, before and after socioeconomic adjustment, using multivariable logistic regression. Results: Among 414,830 participants, P/LP HBOC carrier prevalence was 1.42% (95% CI, 1.38-1.45) overall and similar across ancestry groups (AFR 1.24%, AMR 1.32%, EAS 1.19%, EUR 1.52%, MID 1.68%, SAS 1.33%; overlapping CIs). Among 250,071 women in the testing analysis, documented clinical genetic testing was rare: only 74 of 5,878 carriers overall (1.3%) and 59 of 3,572 European-ancestry carriers (1.7%) had a documented test, with counts below reportable thresholds in all other ancestry groups. 
African-ancestry women had lower adjusted odds of documented testing than European-ancestry women (Model 1 adjusted odds ratio [aOR], 0.32; 95% CI, 0.27-0.39), an association that attenuated but persisted after adjustment for income and education (Model 2 aOR, 0.48; 95% CI, 0.40-0.58; P < 0.001); Admixed American women also had reduced adjusted odds (aOR, 0.71; 95% CI, 0.61-0.84). Lower income and lower education were independently and dose-dependently associated with lower testing odds (income <$25,000 aOR, 0.46; high-school education aOR, 0.54). Conclusions and Relevance: High-risk HBOC variant carriers are present across all ancestry groups at similar frequencies, yet documented clinical genetic testing was disparate in the different ancestry groups. African-ancestry women experience a testing gap that is not fully explained by socioeconomic position, implicating structural barriers in access and referral. Population-level strategies that decouple carrier identification from current referral pathways may be required to close this gap.
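The abstract above reports prevalence with Wilson 95% CIs. As a minimal sketch of that interval (the standard Wilson score formula, not the authors' code), the snippet below applies it to the reported 74 documented tests among 5,878 carriers; the function name `wilson_interval` is chosen here for illustration.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided quantile
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    )
    return centre - half_width, centre + half_width


# Counts taken from the abstract: 74 of 5,878 carriers with a documented test.
low, high = wilson_interval(74, 5878)
print(f"{74 / 5878:.4f} ({low:.4f}-{high:.4f})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for rare events, which matters when proportions are near 1%.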
Walinjkar, A.
Background: Circulating tumour DNA (ctDNA) liquid biopsy is now established across oncology for early cancer detection, minimal residual disease surveillance, and treatment monitoring. Detection thresholds for all current ctDNA assays are derived empirically through receiver operating characteristic analysis on training cohorts - a statistically valid but theoretically uninformed approach that does not specify the minimum detectable tumour fraction given assay technical characteristics, nor identify when increasing sequencing depth ceases to provide additional clinical information. Methods: We model ctDNA detection as a binary hypothesis testing problem with Binomial-distributed mutant allele counts against a sequencing error noise floor. The Neyman-Pearson lemma is applied to derive the uniformly most powerful detector and the minimum detectable tumour fraction in closed form. The sequencing assay is modelled as a binary symmetric channel and Shannon channel capacity is calculated. Empirical validation uses n=61 data points extracted from five published peer-reviewed analytical validation studies across five independent institutions in the US and EU (2018 - 2025): Yu et al. 2022, Stetson et al. 2018, Frydendahl et al. 2023, Northcott et al. 2024, and Cheng et al. 2025. Results: The minimum detectable tumour fraction is derived in closed form as f_min approximately equal to (z_alpha + z_beta) multiplied by the square root of (epsilon divided by N), where N is sequencing depth, epsilon is the platform error rate, and z_alpha, z_beta are standard normal quantiles at the specified false positive and false negative rates. Shannon channel capacity is C = 1 minus H(epsilon) bits per read, where H(epsilon) is binary entropy. Empirical validation yields 84.3% agreement for single-locus assays. 
Discordance for multi-locus tumour-informed assays (NeXT Personal, duplex WGS) is consistent with the single-locus model scope and identifies the principal theoretical extension required. Conclusions: This framework provides the first formal Neyman-Pearson optimality proof for ctDNA detection, a closed-form detection limit, and a platform-independent efficiency metric for NHS and regulatory standardisation. Keywords: circulating tumour DNA; liquid biopsy; Neyman-Pearson detection; Shannon channel capacity; sequencing depth; limit of detection; minimal residual disease; signal detection theory
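The two closed-form quantities stated in the abstract above, f_min ≈ (z_alpha + z_beta) · sqrt(epsilon / N) and C = 1 − H(epsilon), can be evaluated directly. The sketch below is only a numerical illustration of those formulas as quoted; the depth, error rate, and false positive/negative rates are hypothetical values, and one-sided normal quantiles are assumed because the abstract does not say otherwise.

```python
import math
from statistics import NormalDist


def min_detectable_fraction(depth, error_rate, alpha=0.05, beta=0.05):
    """f_min ~ (z_alpha + z_beta) * sqrt(error_rate / depth).

    Assumption: z_alpha and z_beta are one-sided standard normal quantiles
    at the chosen false positive (alpha) and false negative (beta) rates.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return (z_alpha + z_beta) * math.sqrt(error_rate / depth)


def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def channel_capacity(error_rate):
    """Capacity of a binary symmetric channel, in bits per read."""
    return 1.0 - binary_entropy(error_rate)


# Hypothetical assay: 10,000x depth, 0.1% per-base error rate.
f_example = min_detectable_fraction(depth=10_000, error_rate=1e-3)
c_example = channel_capacity(1e-3)
print(f_example, c_example)
```

Because f_min scales with the inverse square root of N, quadrupling the depth only halves the detectable fraction, which is the diminishing-returns behaviour the abstract refers to.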
gahan, k.
Background. Area-level cancer disparities are routinely estimated from public county data in which rates based on small counts (fewer than 16 cases or deaths) are suppressed. Analysts typically drop suppressed counties (complete-case analysis). Because suppression depends on case counts tied to population size and demographic composition, this missingness may be informative, but its effect on the disparity estimate has not, to our knowledge, been quantified. Methods. In a cross-sectional ecological study of 3,143 U.S. counties (analytic sample 3,018 with computable exposure) using one frozen public release of NCI State Cancer Profiles incidence and mortality data and ACS 2018-2022 5-year data, we estimated the most- versus least-deprived ICE(race+income) quintile rate ratio (RR) and rate difference for female breast, stomach, and cervix cancers under four suppression-handling methods: complete-case, available-case, bounding, and model-based small-area estimation. We characterized which counties were erased, and, following the ADEMP framework, ran a Monte Carlo simulation (1,000 replicates per cell; Monte Carlo standard error of bias approximately 0.0025) calibrated to the release to measure bias against a known truth. Analyses were pre-registered. Results. The suppressed fraction rose with rarity: 7.4% of counties for breast, 61.3% for stomach, and 75.7% for cervix incidence. Suppression was concentrated in the most-deprived quintile (cervix, 81.8% suppressed vs 63.8% least-deprived) and overwhelmingly removed rural rather than minority residents (cervix: 81% of the rural but 9% of the minority population erased). For breast (little suppression) the RR was 0.87 (95% CI 0.85-0.89) and identical across methods; for cervix incidence the complete-case RR (1.56) exceeded the model-based estimate (1.50), and for cervix mortality (91% suppressed) complete-case (1.86) exceeded model-based (1.56) by 16% with a wide bounding interval (1.88-2.62). 
In calibrated simulation, population-weighted complete-case bias was small (less than 2%) at the observed deprivation-county-size correlation and grew with rarity, threshold, and unweighted aggregation; its direction was conditional, becoming positive (over-estimation) as deprived counties became smaller. Conclusions. Complete-case handling of suppressed counties over-estimates rare-cancer area disparities relative to methods that retain them, while silently erasing most of the rural and most-deprived communities the estimate is meant to represent. The effect is negligible for common cancers and grows with rarity. Public-data disparity analyses should report the suppressed fraction and use bounded or model-based estimates by default. Keywords: cancer disparities; small-count suppression; Index of Concentration at the Extremes; informative missingness; small-area estimation; rural health.
Feierabend, S.; Künstner, A.; Forster, M.; Helbing, T.; Gebauer, N.; Gemoll, T.; Axt, F.; Nimmagadda, S. C.; Ranganathan, L.; Schwandt, J.; Heber, M.; Szymczak, S.; Hohensee, I.; Fliedner, S. M. J.; Scherer, F.; Oberländer, M.; Derer-Petersen, S.; Busch, H.; von Bubnoff, N.; Dazert, E.
Cancer treatment has shifted toward personalized therapy based on molecular profiling, particularly in advanced disease. Existing circulating tumor DNA panels are often broad, generating many non-actionable variants and incurring costs that limit routine use in molecular tumor boards. We developed and validated a manufacturer-independent, 109-gene liquid biopsy-centered pan-cancer open next generation sequencing panel (LION panel), combined with an in-house bioinformatic pipeline to support clinical decision-making. A total of 87 samples were analyzed, including 17 reference samples, 21 healthy blood donor controls, and 49 patient samples including nine tumor entities. The LION panel achieved 92% sensitivity and 99% specificity in reference samples, with high concordance to digital droplet PCR (r = 0.99). It detected variant allele frequencies as low as 0.05% (tumor-informed) and 0.5% (tumor-uninformed). Clinical concordance reached 82% with blood-based digital droplet PCR and 75% with whole exome tissue sequencing. In representative cases, variant dynamics correlated with disease progression and revealed additional targetable variants. Overall, the LION panel supports clinical decision-making by enabling identification of targetable variants, disease monitoring, and detection of treatment resistance, particularly when tumor tissue is unavailable.
King, D. W.; King, P. E.; Blanchard, M. W.; Ning, N. W.; King, S. K.; Grimm, M. C.; Ha, T.; Eagar, K.
Objective: To determine if it is possible to assess individual patient risk of the development of colorectal cancer (CRC) in people in high-risk groups due to their family history. Design/Method: Retrospective observational study of prospectively collected data from consecutive patients referred for a colonoscopy. 2,478 consecutive patients were referred to a single colorectal surgical practice in Sydney, Australia between 1977 and 2018 for a colonoscopy because of a family history of CRC. Of these, 1,963 have been followed for more than 10 years and are the subject of this paper. Histopathological findings were categorised as normal (N), non-advanced adenoma (NAA) or advanced neoplasia (AN), with AN proven to be the precursor to CRC. Intervention: Colonoscopic screening on the basis of contemporary practice to 2006 and subsequently according to Australian National Health and Medical Research Council guidelines. Results: Participants with normal or low-risk findings in the first decade remain at lower risk of CRC for 30 years from the commencement of screening. Conclusion: It is possible to stratify individual patients in a high relative risk cohort into those with high or low personal risk of CRC based on colonoscopic findings in the first 10 years of surveillance. Those with no AN in the first ten years have a lower 30-year risk of developing AN than the general community. This offers the possibility of structuring surveillance programs around individual risk rather than group risk, lessening the need for multiple surveillance colonoscopies in the majority of such patients and improving the cost effectiveness of CRC screening at the population level.
Maciaszek, J. L.; Pastor Loyola, V.; Cain, T.; Cardenas, M.; Blackburn, P. R.; Wilkinson, M. R.; Koo, S. C.; Wu, C.-H.; Li, C.; Wang, L.; Nichols, K. E.; Klco, J. M.; Eldomery, M. K.
Purpose: Pathogenic or likely pathogenic (P/LP) variants are increasingly identified in genes more commonly associated with adult-onset cancer predisposition, but their prevalence and relevance to children who present with cancer remain unclear. Methods: We retrospectively analyzed 1,280 consecutive pediatric patients with cancer who underwent clinical germline sequencing, using a virtual panel, from 2021 to 2024. Genes with P/LP variants were categorized as adult-onset cancer predisposition genes (aoCPG) or pediatric-onset cancer predisposition genes (poCPG) according to cancer risk before age 18 years and pediatric surveillance recommendations. Variant relevance was adjudicated using tumor diagnosis/histopathology, immunohistochemistry, and tumor molecular features and classified as primary, secondary, or indeterminate. Results: Among 1,280 patients, 197 (15.4%) harbored 211 P/LP variants across 54 genes. Sixty-six variants (31.3%) occurred in aoCPG, 87 (41.2%) in poCPG, and 58 (27.5%) were heterozygous variants in autosomal recessive genes. Among adult-onset variants, 7 (10.6%) were primary, 54 (81.8%) secondary, and 5 (7.6%) indeterminate. Among pediatric-onset variants, 77 (88.5%) were primary and 10 (11.5%) secondary. Six patients (3 adult-onset variants; 3 pediatric-onset variants) received targeted therapy informed by germline/somatic sequencing results. Conclusion: In pediatric oncology, most variants in aoCPG are secondary rather than tumor-related findings. Tumor-informed interpretation, beyond variant classification, may improve reporting, counseling, and therapeutic decision-making.
Sowunmi, A.; Agbakwuru, C.; Aje, E.; Kehinde, O.; Andero, T.; Eze, C. G.; Oshikanlu, B.
Background: Triple-negative breast cancer (TNBC) is an aggressive breast cancer subtype characterized by the absence of estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 expression. It is associated with limited targeted treatment options, early relapse, and a high propensity for visceral metastasis. Data describing metastatic patterns and treatment characteristics of TNBC in Nigeria remain limited. Methods: This retrospective descriptive cohort study included 869 patients with TNBC managed at the Medserve-LUTH Cancer Center, Lagos University Teaching Hospital, Nigeria between June 2019 and June 2024. Demographic, clinicopathologic, metastatic, and treatment-related data were extracted from electronic medical records. Descriptive statistics were used to summarize patient characteristics, metastatic patterns, and treatment profiles. Associations between metastatic disease and selected clinicopathologic and treatment variables were explored using Pearson's chi-square test. Complete-case analysis was applied throughout. Results: The mean age at presentation was 52.09 ± 12.26 years. Most patients were married (79.1%), postmenopausal (64.3%), and of Yoruba ethnicity (56.8%). Advanced disease predominated, with Stage III and Stage IV disease accounting for 42.9% and 35.6% of cases, respectively. Invasive ductal carcinoma was the most common histologic subtype (77.0%), while Grade II tumours constituted 51.3% of graded cases. Surgery was performed in 73.1% of patients, predominantly mastectomy (70.9% of surgical procedures). Chemotherapy was administered to 83.2% of patients, most commonly anthracycline-based regimens (41.8%), while radiotherapy was delivered to 63.5% of patients, with hypofractionated schedules of 42-43 Gy in 15-16 fractions accounting for 47.2% of radiotherapy courses. Metastatic disease was documented in 32.9% of evaluable patients. 
Lung metastasis was the most frequent site (62.5%), followed by bone (46.3%), regional lymph node invasion (38.5%), liver (23.0%), and brain (22.6%). Tumour grade and histologic subtype were not significantly associated with metastatic disease, whereas radiotherapy exposure demonstrated a significant association with metastatic status (χ² = 10.35, p = 0.001). Conclusion: TNBC in this Nigerian cohort was characterized by advanced-stage presentation, invasive ductal predominance, extensive use of multimodality treatment, and substantial visceral metastatic burden. Lung metastasis was the most common metastatic site. These findings provide contemporary real-world data on TNBC in Nigeria and highlight the continuing need for earlier diagnosis, timely referral, and sustained investment in comprehensive cancer care services.
Pregnall, A. M.; Hornick, M. M.; Broach, R. B.; Judy, R.; DePaolo, J.; Yuan, S.; Levin, M.; Fischer, J. P.; Damrauer, S. M.; Wachtel, H.
Objectives: Incisional hernia (IH) affects 13-30% of people after abdominal surgery, resulting in substantial morbidity and costs. While clinical risk factors have been studied extensively, genomic risk for IH is incompletely understood. We aimed to evaluate the impact of polygenic risk scores (PRS) on IH risk prediction. Methods: We created and evaluated three PRS for abdominal hernia, ventral hernia and latent hernia susceptibility for prediction of IH in an institutional biobank. The primary outcome was defined as the diagnosis or repair of an IH based on ICD-9/10-CM/PCS and CPT codes. Clinical covariates included age, sex, body mass index (BMI), smoking status, index procedure type, and perioperative surgical site infection. A phenome-wide association study (PheWAS) was performed to assess clinical associations with increased PRS. We then tested the ability of the PRS to improve prediction for IH by modeling clinical covariates with and without PRS in patients who underwent abdominal surgery. Model performance was assessed using 10 iterations of 5-fold cross-validation to estimate Brier scores and area under the receiver operating characteristic curve (AUROC), which were compared using cross-model Bayesian analysis of variance. Results: In 55,809 subjects, assessed PRS was significantly associated with incisional, umbilical, and ventral hernia on PheWAS, with 1.19-fold greater odds of developing IH per 1-SD increase in PRS (95% CI: 1.13-1.25, P < 0.001). Of 9,909 subjects who underwent qualifying abdominal surgery, 706 developed IH. In this cohort, the latent hernia susceptibility PRS was associated with a 16% increased hazard of developing IH per 1-SD increase (HR 1.16; 95% CI: 1.07-1.26; P < 0.001). 
Compared to a predictive model using clinical covariates (Brier score = 0.047, 95% CI: 0.046-0.048; AUROC = 0.660, 95% CI: 0.653-0.666), addition of the PRS showed similar Brier score and AUROC estimates (Brier score = 0.047, 95% CI: 0.046-0.048; AUROC = 0.667, 95% CI: 0.661-0.673) at five years. Cross-model Bayesian analysis demonstrated >99% probability of practical equivalence when trying to detect a difference of ≥ 0.02. Conclusion: All three PRS for hernia were independently associated with IH, suggesting that genomic factors contribute significantly to IH development. However, none of the three PRS meaningfully improved clinical IH risk prediction in patients who underwent abdominal surgery. This suggests that clinical comorbidities and surgical techniques may be as important as genomic architecture.
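The abstract above compares models by Brier score and AUROC. For readers unfamiliar with the two metrics, here is a minimal sketch on binary outcomes with made-up predictions. It omits the cross-validation, the five-year time horizon, and the Bayesian comparison the authors describe, and the function names and toy data are assumptions for illustration.

```python
def brier_score(outcomes, predicted_probs):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    pairs = list(zip(outcomes, predicted_probs))
    return sum((p - y) ** 2 for y, p in pairs) / len(pairs)


def auroc(outcomes, predicted_probs):
    """Probability that a random case is scored above a random non-case.

    Computed by comparing every case/non-case pair; ties count as 0.5.
    """
    cases = [p for y, p in zip(outcomes, predicted_probs) if y == 1]
    controls = [p for y, p in zip(outcomes, predicted_probs) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data: 1 = developed the outcome, 0 = did not.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(brier_score(y_true, y_prob), auroc(y_true, y_prob))
```

The Brier score reflects calibration as well as discrimination, while AUROC reflects ranking only, which is why the abstract reports both before concluding that adding the PRS made no practical difference.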
Owusu-Boaitey, N.; Meyer, M. J.; Herrera-Esposito, D.; Bottcher, L.; Lukz, M.; Cook, S.; Stoto, M. A.; Kraemer, J. D.
Seroprevalence surveys reveal the extent of humoral immunity against pathogens such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and under some circumstances represent cumulative incidence of prior infection. However, antibody waning - or seroreversion - biases these estimates by reducing assay sensitivity in a time-varying manner. Because assay sensitivity decays over time, naively using serosurveys can substantially bias estimates of SARS-CoV-2 cumulative incidence and fatality rates. The Bayesian assay-specific, time-varying sensitivity adjustment developed in this paper can reliably correct for this bias and account for the delay between infection and serosurvey. In seroprevalence studies conducted in the United States in 2020, adjusting for time-varying sensitivity increased cumulative incidence by up to 1.4-fold, with an adjustment of 1.08 for a national study. Our estimates contrast with a previously published 2-fold adjustment that did not account for assay design. This suggests that previous analyses overestimated cumulative incidence by applying seroreversion corrections that did not account for assay-specific effects, or underestimated cumulative incidence by not applying seroreversion corrections. These biases imply fatality rate underestimation and overestimation, respectively. Our model provides a framework for design-specific time-varying sensitivity corrections in seroprevalence surveys for other pathogens.
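To make the idea of a time-varying sensitivity correction concrete, the sketch below uses a simple point-estimate stand-in, not the Bayesian model from the abstract above: it averages an assay's sensitivity over the assumed timing of past infections, then applies the standard Rogan-Gladen correction. The function names, infection weights, and sensitivity values are all hypothetical.

```python
def effective_sensitivity(infection_weights, sensitivity_by_delay):
    """Average assay sensitivity over when infections occurred.

    infection_weights[i]: relative share of infections in delay bin i.
    sensitivity_by_delay[i]: assay sensitivity for that delay since infection.
    Both inputs are hypothetical here; a real analysis would estimate them.
    """
    total = sum(infection_weights)
    weighted = sum(w * s for w, s in zip(infection_weights, sensitivity_by_delay))
    return weighted / total


def adjusted_cumulative_incidence(observed_seroprevalence, sensitivity, specificity=1.0):
    """Rogan-Gladen correction of an observed seroprevalence."""
    return (observed_seroprevalence + specificity - 1.0) / (sensitivity + specificity - 1.0)


# Half of infections recent (sensitivity 0.9), half older (waned to 0.7).
sens = effective_sensitivity([1, 1], [0.9, 0.7])
incidence = adjusted_cumulative_incidence(0.08, sens)
print(sens, incidence)
```

In this toy case an observed seroprevalence of 8% corresponds to a higher corrected cumulative incidence, because waning lowers the average sensitivity; the size of the correction depends entirely on the assumed decay and infection timing.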
Agarwal, T.; Namburu, J. R.; Kachroo, P.
Background: Pregnancy loss has important implications for women's health. Although maternal age is a well-established risk factor, the contribution of routinely measured cardiometabolic and behavioral markers at population scale remains incompletely characterized. Objective: To examine associations between cardiometabolic, nutritional, and behavioral risk markers and pregnancy loss among U.S. women of reproductive age. Methods: We conducted a cross-sectional analysis of 4,842 U.S. women aged 20-44 years with ≥1 pregnancy using the National Health and Nutrition Examination Survey data (2013-2023). Pregnancy loss was defined as ≥1 prior miscarriages. Exposures included body mass index, smoking exposure (cotinine), lipid biomarkers, vitamin D and folate, and a composite cardiometabolic-nutritional risk score. Survey-weighted logistic regression estimated adjusted odds ratios (aORs) and 95% confidence intervals, with bootstrap resampling for predictor robustness. Results: The weighted prevalence of pregnancy loss was 23%. Higher odds of pregnancy loss were associated with increasing age (aOR per year=1.02; 95% CI: 1.00-1.04), Non-Hispanic Black race (aOR=1.32; 95% CI: 1.00-1.74), overweight (aOR=1.56; 95% CI: 1.16-2.11), obesity (aOR=2.06; 95% CI: 1.39-3.05), and smoking (aOR=1.58; 95% CI: 1.19-2.10). Adverse lipid profiles, particularly elevated triglycerides (aOR=1.83; 95% CI: 1.16-2.90) and high low-density lipoprotein (aOR=2.97; 95% CI: 1.45-6.61), were independently associated with pregnancy loss. Vitamin D/folate were not stable predictors. Higher composite cardiometabolic-nutritional risk scores were observed among women with pregnancy loss (P=0.026). Conclusion: Pregnancy loss clustered with adverse cardiometabolic and behavioral risk markers in a nationally representative population. 
These findings highlight pregnancy loss as a marker of broader metabolic vulnerability supporting the need for longitudinal studies and cardiometabolic profiling to inform preconception care and risk stratification.
Jensen, T. D.; Kaur, R.; Bonner, D. E.; Nguyen, J.; Reuter, C. M.; Undiagnosed Diseases Network, ; Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium, ; Ashley, E. A.; Bernstein, J. A.; Wheeler, M. T.; Montgomery, S. B.
Background: Aberrant DNA methylation can mediate the functional effects of rare genetic variation and contribute to imprinting disorders, repeat expansion diseases, and other pathogenic regulatory mechanisms. Long-read sequencing technologies now enable genome-wide detection of CpG methylation alongside genetic variation from a single assay. However, methods for systematic identification and interpretation of methylation outliers from long-read sequencing data remain limited. Methods: We developed METAFORA, a computational workflow for detecting methylation outlier regions from PacBio and Oxford Nanopore long-read sequencing data. METAFORA constructs population-level methylation references, segments the genome into correlated CpG blocks, infers technical and biological sources of variation through hidden factor estimation, models uncertainty due to variable depth sequencing, and computes covariate-adjusted methylation outlier scores for individual samples. We applied METAFORA across large long-read sequencing cohorts and integrated methylation outliers with multi-omic data. METAFORA is implemented as a snakemake workflow available at https://github.com/tjense25/METAFORA. Results: METAFORA identified methylation outlier regions associated with rare structural variants, tandem repeat expansions, and imprinting abnormalities. We found outlier regions were enriched for molecular outliers across transcriptomic and chromatin accessibility datasets, supporting their functional relevance in gene regulation. In a representative case, METAFORA identified an imprinting defect affecting the GNAS locus associated with an STX16 deletion. Conclusions: METAFORA enables scalable detection and interpretation of methylation outliers from long-read sequencing data and provides a framework for integrating epigenetic outliers with genomic and multi-omic analyses. 
These approaches may improve interpretation of rare regulatory variation and support discovery of clinically relevant epigenetic abnormalities in genomic medicine.
Diaz, F. C.; Waldrup, B.; Carranza, F. G.; Manjarrez, S.; Velazquez-Villarreal, E.
Background: Pancreatic ductal adenocarcinoma (PDAC) is characterized by extensive molecular complexity, profound stromal remodeling, and limited responsiveness to systemic therapies. Although gemcitabine-based regimens remain widely utilized, the molecular pathways that influence treatment-associated biological variation are incompletely understood. The TGFβ and JAK/STAT signaling networks are recognized regulators of tumor progression, immune modulation, and therapeutic resistance; however, their genomic architecture in clinically stratified PDAC populations remains poorly defined. Methods: We employed a conversational artificial intelligence-driven analytical framework to investigate TGFβ and JAK/STAT pathway alterations in a cohort of 184 PDAC patients. Clinical and molecular data were integrated to generate age- and treatment-stratified cohorts, enabling pathway-level and gene-level analyses according to gemcitabine exposure. Findings generated through AI-assisted interrogation were subsequently evaluated using conventional statistical approaches. Results: TGFβ pathway alterations were identified in approximately one-quarter to one-third of tumors across clinical subgroups and demonstrated relatively stable frequencies regardless of age at diagnosis or gemcitabine treatment status. Gene-level analyses revealed that pathway disruption was predominantly driven by recurrent alterations in SMAD4, with additional low-frequency events involving TGFBR1 and TGFBR2. Notably, TGFBR2 mutations were significantly more frequent among late-onset PDAC patients receiving gemcitabine compared with untreated late-onset patients (8.8% vs. 1.4%; p = 0.04), suggesting a potential treatment-associated enrichment. In contrast, JAK/STAT pathway alterations were rare throughout the cohort, with only isolated mutations observed in pathway components including JAK1, JAK2, JAK3, STAT1, STAT3, and related regulatory genes. 
No significant differences in JAK/STAT alteration frequencies were identified according to age or treatment exposure. Conclusions: TGFβ and JAK/STAT pathways exhibit distinct genomic architectures in PDAC. TGFβ pathway disruption represents a recurrent feature of disease biology, largely driven by SMAD4 alterations, while TGFBR2 enrichment in gemcitabine-treated late-onset tumors suggests a potential context-specific association worthy of further investigation. Conversely, genomic alterations within the JAK/STAT pathway are uncommon, indicating that pathway activity may be regulated predominantly through non-genomic mechanisms. These findings demonstrate the utility of conversational artificial intelligence agents for rapid, scalable, and clinically contextualized pathway interrogation and support future studies integrating multi-omic data to refine precision medicine strategies in PDAC.
Chen, F.; You, R.; Liu, Y.; Yin, Y.; Liu, A.; Deng, L.; Xie, B.; Fan, J.; Wang, W.
Background and Aims: Metabolic dysfunction-associated steatotic liver disease (MASLD) has become the most prevalent chronic liver disease globally. Although moderate-to-vigorous physical activity (MVPA) and plasma fatty acids have been individually studied in relation to metabolic health, their independent and combined associations with MASLD incidence remain unclear. We aimed to investigate these associations. Methods: This study included 51,717 UK Biobank participants free of liver disease at baseline, with MVPA measured using wrist-worn accelerometers and plasma fatty acids quantified via NMR. Multivariable-adjusted Cox models and restricted cubic splines were used. Results: Over a median follow-up of 7.8 years, 472 incident MASLD cases were identified. In fully adjusted models, meeting recommended MVPA levels together with higher n-6 polyunsaturated fatty acid (PUFA) concentrations was associated with a 71% lower risk (HR 0.29, 95% CI 0.18-0.45). The MVPA-MASLD association was nonlinear, with risk reduction plateauing at approximately 189 minutes per week. Higher n-6 PUFA was associated with reduced risk, whereas n-3 PUFA showed no significant association. Conclusions: These findings suggest that behavioral and metabolic factors may jointly influence MASLD risk. Further studies in diverse populations are needed to confirm these associations.
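As a quick arithmetic check on the Results above, a hazard ratio below 1 maps to a percent reduction in hazard as (1 - HR) × 100. The sketch below applies that conversion to the reported hazard ratio and its confidence limits. The function name `percent_lower_hazard` is a hypothetical helper introduced here for illustration and is not taken from the study.

```python
def percent_lower_hazard(hazard_ratio: float) -> float:
    """Percent reduction in hazard implied by a hazard ratio below 1."""
    return (1.0 - hazard_ratio) * 100.0


# Values as reported in the abstract: HR 0.29, 95% CI 0.18-0.45.
reported = {"point": 0.29, "ci_lower": 0.18, "ci_upper": 0.45}

# Round to whole percentages, matching the precision used in the abstract.
reductions = {name: round(percent_lower_hazard(hr)) for name, hr in reported.items()}

for name, value in reductions.items():
    print(f"{name}: {value}% lower hazard")
```

Read this way, the point estimate corresponds to the stated 71% lower risk, and the confidence limits correspond to a reduction of roughly 55% to 82%.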
Tredget, G.; Milenova, M.; Parkash, R.; McGrath, R.; Edwards, M. J.; Gee, S.; Pigg, W.; Karwacki, D.; Costa, C.; Shafique, S.; Adams, M.; Waghorn, J.; I'Anson, D.; Ronaldson, A.; Haire, K.; Githuku, C.; Beveridge, E.; Williams, J.
Background: Adults with severe mental health conditions (often referred to as severe mental illness, SMI) experience a 15- to 20-year mortality gap relative to the general population, with lung cancer a significant contributor. National cancer policy targets earlier diagnosis but does not explicitly address how pathways function for this group. Aims: This study aimed to describe lung cancer risk, prevalence, screening eligibility, referral activity and diagnostic pathway performance for adults with SMI in South East London (SEL), and to examine where along the pathway inequalities arise. Methods: Co-designed with experts with lived experience and the voluntary sector, this exploratory mixed-methods service evaluation combined quantitative analysis of routinely collected data from the Quality and Outcomes Framework (QOF), SMI Register and Cancer Waiting Times Record (April 2023-March 2024) with semi-structured qualitative interviews (n=11 clinical staff) and focus groups (n=6 adults with lived experience of SMI). Quantitative and qualitative data were analysed using descriptive statistics and framework-based thematic analysis respectively, and findings were integrated using a joint display approach, organised by the Consolidated Framework for Implementation Research (CFIR). Results: Lung cancer prevalence was approximately double among adults with SMI (0.17% vs 0.09% in the general population). Despite Urgent Suspected Cancer (USC) referral rates being more than twice as high in the SMI population (63 vs 28 per 100,000), fewer cancers were detected via planned general practice (GP) routes (11% vs 20%). The 28-day Faster Diagnosis Standard (FDS) was not met for any SMI patient diagnosed with lung cancer during the study period; overall FDS performance was 76% in the SMI population compared with 84% in the general population. Appointment non-attendance was more than double that in the general population (6% vs 3%).
Qualitative findings identified individual, service and system-level mechanisms, including stigma, diagnostic overshadowing, fragmented coordination, and rigid pathway protocols, that compound disadvantage across lung cancer pathway stages. Conclusions: Inequality in lung cancer outcomes for adults with SMI accumulates across the pathway rather than arising at a single point of failure. Addressing this requires proportionate adaptations within existing cancer pathways, alongside routine reporting of cancer outcomes stratified by SMI population. Keywords: severe mental health conditions, lung cancer, health inequalities, cancer screening, diagnostic pathway, mixed methods
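The paired figures in the Results above can be compared as simple ratios of the SMI value to the general-population value. The sketch below does only that arithmetic on the rounded numbers printed in the abstract. The labels and the `ratio` helper are illustrative assumptions, and the underlying counts are not available here.

```python
def ratio(smi: float, general: float) -> float:
    """Ratio of the SMI-population value to the general-population value."""
    return smi / general


# (SMI value, general-population value), as reported in the abstract.
comparisons = {
    "lung cancer prevalence (%)": (0.17, 0.09),
    "USC referrals per 100,000": (63, 28),
    "cancers detected via planned GP routes (%)": (11, 20),
    "appointment non-attendance (%)": (6, 3),
}

ratios = {label: round(ratio(s, g), 2) for label, (s, g) in comparisons.items()}

for label, value in ratios.items():
    print(f"{label}: SMI/general = {value}")
```

On the rounded figures, non-attendance comes out at a ratio of exactly 2.0, so the "more than double" wording presumably rests on unrounded values that the abstract does not report.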
Li, Q.; Xu, L.; Wang, J.; Li, C.; Wen, W.; Shu, X.; Yang, Y.; Shu, X.-o.; Cai, Q.; Long, J.; Singh, B.; Lau, K. S.; Yin, Z.; Casey, G.; Song, M.; Peters, U.; Zheng, W.; Guo, X.
Bulk tissue-based DNA methylation-wide (MWAS) and transcriptome-wide association studies (TWAS) have identified CpG sites and genes associated with colorectal cancer (CRC) risk, but do not account for cellular heterogeneity. To address this, we developed a deconvolution-informed framework to infer cell-type specific DNA methylation and gene expression profiles from bulk normal colon tissues using reference single-cell epigenomic and transcriptomic datasets. We performed cell-type specific MWAS (ctMWAS) using deconvoluted DNA methylation data from 293 normal colon samples and conducted cell-type specific TWAS (ctTWAS) using deconvoluted gene expression data from 707 normal colon samples. Genetically predicted methylation and expression models were integrated with CRC GWAS summary statistics (78,473 cases and 107,143 controls) to identify risk-associated CpG sites and genes. Through ctMWAS, ctTWAS, and colocalization analyses, we identified 178 significant cell-type-specific CpG sites in 106 loci and 68 risk genes in 40 loci, including 26 previously unreported loci. Through additional integrative methylation-gene analysis, we prioritized 132 candidate risk genes, the majority of which were supported by multi-omics evidence and stage-specific dysregulation across the adenoma-carcinoma and serrated-carcinoma progression pathways. Pathway enrichment analyses implicated pathways involved in DNA double-strand break repair, TP53 regulation, TGF-β signaling, and innate immune responses. Among prioritized genes, 14 were identified as putative druggable targets linked to 90 FDA-approved or clinical-stage drugs. Experimental validation supports an oncogenic role for SF3A3. These findings demonstrate that deconvolution-informed integrative analyses enable cell-type-resolved identification of epigenetic and transcriptional mechanisms underlying CRC susceptibility and provide insights into disease biology, prevention, and therapeutic target discovery.
Wang, M.; Zhao, T.; Wang, H.; Hou, S.; Fu, Y.
Introduction: We investigated the epidemiological characteristics of chronic kidney disease (CKD) in China in 2021 and its trends between 1990 and 2021, in the context of significant population growth and lifestyle changes over the past 30 years that have likely influenced the CKD spectrum. Methods: Data on CKD prevalence, mortality, disability-adjusted life-years (DALY), and risk factors were obtained from the Global Burden of Disease Study 2021. The estimated decadal percentage changes were calculated to evaluate changes in trends in prevalence, mortality and disease burden. Results: In 2021, an estimated 118.4 (95% UI 109.4 to 127.5) million people in China were affected by CKD, contributing to 204 230 (95% UI 164 736 to 246 372) deaths and 6.13 (95% UI 5.18 to 7.21) million DALY. Although CKD due to diabetes mellitus and hypertension accounted for less than a quarter of all cases, they were responsible for over 90% of CKD-related deaths. Over the past three decades, CKD mortality and DALY rates have steadily increased, although the prevalence has stabilized in the last decade. Diabetes mellitus type 2 and hypertension have emerged as key drivers of CKD burden in China. Conclusions: The CKD burden in China shows a dual pattern of rising incidence and high mortality from diabetes and hypertension-related chronic kidney disease, alongside persistently high years lived with disability from glomerulonephritis and other causes.
Tahir, W.; Shamshoian, J.; Tauber, J.; Clinton, L. K.; Griffin, M.; Shah, C.; Singh, G.; Fahy, D.; Sucipto, K.; Brosnan-Cashman, J.; Altepeter, T. A.; Bhattacharya, S.; Crandall, W.; Duan, C.; Gale, J. D.; Gupta, V.; Haarmann, H.; Harpaz, N.; Hooper, A. T.; Horowitz, J.; Hurtado-Lorenzo, A.; Hussaini, B. E.; Jairath, V.; Jones, A.; Kostiuk, B.; Kukreja, A.; Laroux, F. S.; Lissoos, T.; McBride, R. B.; Najdawi, F.; Nayyar, A.; Osterman, M. T.; Panchal, P.; Ruane, D.; Travis, S.; Visvanathan, S.; Wilson, L.; Jayson, C.
In clinical trials for ulcerative colitis (UC), pathologists assess disease severity through standardized histological indices, including the Geboes Score, Robarts Histopathology Index (RHI), and Nancy Histologic Index (NHI). Despite strong associations with clinical outcomes, histologic scoring suffers from inter- and intra-reader variability, and consensus criteria for histologic remission remain uncertain. Through a consortium approach, we developed an artificial intelligence-based measurement (AIM) tool for scoring histology in UC mucosal biopsies (AIM-HI UC). This model, trained on a large dataset of UC biopsies (N=10,230), utilizes additive multiple instance learning models leveraging PLUTO, a pathology foundation model, that predict each of the Geboes subgrades, from which the Geboes grade-level score, RHI, and NHI can be calculated. Evaluation of this model on a standalone verification set including clinical trial specimens established algorithm non-inferiority and/or superiority relative to standard qualified pathologists through comparison of algorithm-consensus and pathologist-consensus agreement metrics (non-inferior if difference >-0.1, superior if difference >0, inclusive of confidence intervals). AIM-HI UC was determined to be non-inferior to pathologists (N=3) for the prediction of all seven Geboes subgrades, grade-level Geboes, RHI, NHI, histologic improvement (GS<3.1), 2A histologic remission (GS<2A.0), and 2B histologic remission (GS<2B.0). AIM-HI UC was superior to pathologists for several Geboes subgrades (GS 0, GS 1, GS 2B, and GS 5), as well as grade-level Geboes, RHI, and positive percent agreement of 2A histologic remission. The model was shown to be greater than 99% repeatable for all histologic scoring metrics examined. 
Model-derived scores were shown to strongly correlate with canonical histologic features of inflammation, including the proportion of total epithelium that is inflamed (Spearman r=0.83; p<0.01), the proportion of neutrophils localized within crypt epithelium (Spearman r=0.83, p<0.01), and the amount of mucosal area classified as erosion or ulceration (Spearman r=0.80, p<0.01). Overall, these results suggest that AIM-HI UC has the potential to improve consistency of UC histology interpretation, providing a path toward standardization of UC histology scoring in clinical trials.
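The non-inferiority and superiority thresholds quoted above (difference > -0.1 and difference > 0, "inclusive of confidence intervals") can be written as a small decision rule. The sketch below is one possible reading, assuming the difference is algorithm-consensus agreement minus pathologist-consensus agreement and that the lower confidence bound is the value that must clear each threshold. The class name, function name, and all numbers in the examples are hypothetical and are not results from the study.

```python
from dataclasses import dataclass

NON_INFERIORITY_MARGIN = -0.1  # threshold quoted in the abstract
SUPERIORITY_THRESHOLD = 0.0    # threshold quoted in the abstract


@dataclass
class AgreementDifference:
    """Algorithm-consensus agreement minus pathologist-consensus agreement."""
    estimate: float
    ci_lower: float
    ci_upper: float


def classify(diff: AgreementDifference) -> str:
    # Assumption: "inclusive of confidence intervals" is read as requiring
    # the lower confidence bound to exceed the relevant threshold.
    if diff.ci_lower > SUPERIORITY_THRESHOLD:
        return "superior"
    if diff.ci_lower > NON_INFERIORITY_MARGIN:
        return "non-inferior"
    return "not demonstrated"


# Hypothetical inputs for illustration only.
examples = {
    "metric A": AgreementDifference(estimate=0.05, ci_lower=0.01, ci_upper=0.09),
    "metric B": AgreementDifference(estimate=-0.02, ci_lower=-0.07, ci_upper=0.03),
    "metric C": AgreementDifference(estimate=-0.08, ci_lower=-0.15, ci_upper=-0.01),
}

for name, diff in examples.items():
    print(f"{name}: {classify(diff)}")
```

Under this reading, superiority implies non-inferiority, which matches the abstract's pattern of metrics that are non-inferior overall and superior for a subset.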
Kendzerska, T.; Reyes, J.; Poirier, N.; Poirier, A.; Cull, A.; Murkar, A.; Saymeh, M.; Belanger, S.; Williams, M.; Shlik, J.; Jetly, R.; Robillard, R.
Background: Evidence on factors associated with cannabis for medical purposes (CMP) authorizations among Veterans Affairs Canada (VAC) clients remains limited and inconsistent, particularly concerning mental health and posttraumatic stress disorder (PTSD), a leading indication for use. We investigated demographic, clinical and service characteristics associated with VAC authorizations for CMP reimbursement. Methods: We linked VAC administrative CMP program data with responses from the 2019 Life After Services Studies cross-sectional survey of Regular Force veterans released between 1998 and 2018. Multivariable logistic regressions examined associations between CMP reimbursement (yes/no) and demographic, clinical and well-being factors, with analyses stratified by PTSD status. Results: Among 1,289 respondents (weighted n=33,131), 18.4% were authorized for CMP reimbursement. Younger age (<40 vs. ≥60 years: OR 4.78, 95% CI: 2.24-10.21), unemployment with inability to work vs. employed (OR 3.10, 95% CI: 1.78-5.40), land service vs. air (OR 2.07, 95% CI: 1.22-3.50), PTSD (OR 2.81, 95% CI: 1.69-4.66), anxiety (OR 2.32, 95% CI: 1.45-3.70), and severe pain vs. no pain (OR 3.61, 95% CI: 1.97-6.60) were independently associated with authorization. Unemployment and severe pain were consistent correlates across PTSD strata. Among those without PTSD, younger age, multiple physical conditions, and frequent mental health visits were significant; among those with PTSD, shorter service, witnessing destruction, and suicidal ideation were additional factors. Conclusions: CMP authorization patterns among Canadian veterans reflect the intersection of mental health, pain, and functional impairment, with variation by PTSD status. These findings underscore the need for longitudinal research on CMP mechanisms, effectiveness and safety.
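Odds ratios from logistic regression are usually reported with confidence intervals that are symmetric on the log scale, so the geometric mean of the two limits should land near the point estimate. The sketch below applies that check to the age comparison reported above (OR 4.78, 95% CI 2.24-10.21). It assumes a Wald-type interval with z = 1.96, which the abstract does not state, and `log_scale_summary` is a hypothetical helper name; survey-weighted intervals may be constructed differently.

```python
import math


def log_scale_summary(ci_lower: float, ci_upper: float, z: float = 1.96):
    """Recover the implied point estimate and SE(log OR) from CI limits.

    Assumes the interval is exp(log(OR) +/- z * SE), i.e. a Wald interval.
    """
    log_lower, log_upper = math.log(ci_lower), math.log(ci_upper)
    implied_or = math.exp((log_lower + log_upper) / 2)   # geometric mean of limits
    standard_error = (log_upper - log_lower) / (2 * z)   # half-width on log scale
    return implied_or, standard_error


# Age <40 vs. >=60 years, limits as reported in the abstract.
implied_or, se_log_or = log_scale_summary(2.24, 10.21)
print(f"implied OR = {implied_or:.2f}, SE(log OR) = {se_log_or:.2f}")
```

For this comparison the implied estimate agrees with the reported 4.78 to two decimals, which is consistent with an interval built on the log-odds scale.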